Effects of the COVID-19 pandemic on hospital-acquired infections
BMIN503/EPID600 Final Project
Author
Kevin Mears
1 Overview
The goal of my final project is to determine how the COVID-19 pandemic, with its surge in hospital visits, changes in cleaning and hygiene practices, and people staying at home, affected the incidence of hospital-associated infections. I identified the 2020 data set from the CDC National Hospital Care Survey (NHCS), which contains 132,694 inpatients and 388,753 emergency department visits. It includes patient age, sex, month of discharge, and diagnoses. I will use these data to determine how the incidence of hospital-associated infections changed month to month in 2020 and whether it correlates with other parameters (e.g., COVID-19 diagnosis).
I spoke with Dr. Joseph Zackular (Assistant Professor of Pathology and Laboratory Medicine at CHOP), a bacteriologist and an expert on C. difficile, the most common cause of hospital-acquired infection. He suggested that I stratify the dataset by sex and age, since these demographics might affect hospital infections, and that I look at bacterial pneumonia as a positive control because it is commonly associated with COVID-19 co-infection.
I also spoke with Dr. Kyle Bittinger (Bioinformatics Laboratory Director, CHOP Microbiome Center) who generously offered to help with the code where needed.
2 Introduction
Hospital-acquired infections (HAIs) are infections acquired after hospital admission. They include catheter-associated urinary tract infections, central line-associated bloodstream infections, surgical site infections, ventilator-associated and hospital-acquired pneumonia, and Clostridioides difficile infections. The risk of HAIs depends on many factors, including the facility’s disinfection and infection prevention practices, the patient’s immune status, length of stay in the hospital, comorbidities, ventilator support, and use of invasive procedures. Receipt of antibiotics is one of the major risk factors for developing Clostridioides difficile infection and other multidrug-resistant bacterial infections (e.g., vancomycin-resistant Enterococcus or methicillin-resistant Staphylococcus aureus). According to the CDC, about 4% of hospitalized patients experience at least one HAI, with an estimated 648,000 cases in 2011; these are dominated by pneumonia, surgical site infections, gastrointestinal infections, UTIs, and bloodstream infections.
Clostridioides difficile is the most common cause of hospital-acquired infections and can cause severe, potentially life-threatening colitis. Common causes of hospital-associated and ventilator-associated pneumonia are Staphylococcus aureus, Pseudomonas aeruginosa, E. coli, and Klebsiella pneumoniae. Common causes of catheter-associated UTIs are Enterococcus, Staphylococcus aureus, Pseudomonas, Proteus, Klebsiella, and Candida. Common causes of surgical site infections include Staphylococcus aureus, other Staphylococcus, Enterococcus, E. coli, Pseudomonas aeruginosa, Enterobacter, and Klebsiella.
The COVID-19 pandemic led to a surge in hospital visits that overwhelmed our healthcare system, and it caused over a million deaths. At the same time, disinfection practices intensified in public facilities, including hospitals. Many causes of hospital-associated infections, including C. difficile, MRSA, VRE, and norovirus, are difficult to eliminate with traditional cleaning methods. The present study aims to determine how the COVID-19 pandemic affected the incidence of hospital-associated infections in 2020.
This is an interdisciplinary question because it requires microbiology to understand the factors that affect pathogenesis, epidemiology to appreciate how disease spreads in a hospital setting, and data science for the bioinformatic analysis of hospital survey data.
Sources:
Monegro et al. Hospital-Acquired Infections. (2023). In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing. https://www.ncbi.nlm.nih.gov/books/NBK441857/
Dewey et al. Increased Use of Disinfectants During the COVID-19 Pandemic and Its Potential Impacts on Health and Safety. (2021) ACS Chem Health Safety.
Dancer, S.J. Controlling Hospital-Acquired Infection: Focus on the Role of the Environment and New Technologies for Decontamination. (2014) Clin Microbiol Rev.
3 Methods
library("tidyverse")
# Package versions attached at render time: tidyverse 2.0.0 (dplyr 1.1.4, readr 2.1.5, forcats 1.0.0, stringr 1.5.1, ggplot2 3.5.1, tibble 3.2.1, lubridate 1.9.3, tidyr 1.3.1, purrr 1.0.2)
library("ggplot2")
library("cowplot")
# Read in NHCS 2020 inpatient Public Use File R Dataset
# https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHCS/2020/R/nhcs2020ip_r.rds
url <- "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHCS/2020/R/nhcs2020ip_r.rds"
nhcs2020ip <- read_rds(url)
# inpatient data set
# pull out variables list
var_2020 <- nhcs2020ip$variables
# select useful columns
var_2020_select <- select(var_2020, 1:71)
# variable names
varnames_2020 <- colnames(var_2020_select)
# pull out primary diagnosis (DX1)
diag1_2020 <- var_2020_select$DX1
# determine the number of primary C. diff infections (ICD-10 diagnosis code: A047)
primaryCdiff_2020 <- sum(diag1_2020 == "A047", na.rm = TRUE)
primaryCdiff_2020
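The later chunks filter a long table, alldiag_2020, that holds one diagnosis code per row, but the step that builds it does not appear above. Below is a minimal sketch of one way such a table could be derived. It assumes the diagnosis columns are named DX1, DX2, and so on (only DX1 is confirmed above); the toy data frame `visits_toy`, its `visit_id` column, and all values in it are hypothetical.

```r
# Toy stand-in for the wide visit table (made-up values, not NHCS data).
visits_toy <- data.frame(
  visit_id        = 1:3,                       # hypothetical identifier
  DISCHARGE_MONTH = c(1, 4, 4),
  DX1             = c("U071", "A047", "J159"),
  DX2             = c("J159", NA, "U071"),
  stringsAsFactors = FALSE
)

# Find the diagnosis columns by name (assumed pattern: DX followed by digits).
dx_cols <- grep("^DX[0-9]+$", names(visits_toy), value = TRUE)

# Stack the diagnosis columns so each row holds one code for one visit.
alldiag_toy <- do.call(rbind, lapply(dx_cols, function(col) {
  data.frame(
    visit_id        = visits_toy$visit_id,
    DISCHARGE_MONTH = visits_toy$DISCHARGE_MONTH,
    position        = col,
    DX              = visits_toy[[col]],
    stringsAsFactors = FALSE
  )
}))

# Drop the empty diagnosis slots.
alldiag_toy <- alldiag_toy[!is.na(alldiag_toy$DX), ]

# Rows carrying the COVID-19 code in any diagnosis position.
covid_toy <- alldiag_toy[alldiag_toy$DX == "U071", ]
nrow(covid_toy)
```

In a table shaped like this, counts are counts of diagnosis rows, so a visit that carries several matching codes contributes more than one row.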
# A tibble: 9 × 2
DISCHARGE_STATUS n
<fct> <int>
1 Routine to home 3178
2 Left against medical advice 74
3 Transfer to short term facility 83
4 Transfer to long term facility 249
5 Home health care 854
6 Hospice care - home or medical facility 178
7 Other 880
8 Dead 875
9 <NA> 78
# visualization
ggplot(covid_patients, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("COVID-19 cases by sex") +
  theme_bw()
ggplot(covid_patients, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("COVID-19 cases by age") +
  theme_bw()
ggplot(covid_patients, aes(x = LOS)) +
  geom_bar() +
  labs(x = "Length of stay (up to 14 days)", y = "Count") +
  ggtitle("COVID-19 cases by length of stay") +
  theme_bw()
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_count()`).
# greater than 15 binned in data
ggplot(covid_patients, aes(x = LOS_30DAYS)) +
  geom_bar() +
  labs(x = "Length of stay greater than 30 days", y = "Count") +
  ggtitle("COVID-19 cases by length of stay greater than 30 days") +
  theme_bw()
ggplot(covid_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("COVID-19 cases by discharge month") +
  theme_bw()
# subset bacterial pneumonia patients
bactpneumo_patients <- filter(alldiag_2020, DX %in% c(
  "J13", "J14", "J150", "J151", "J1520", "J15211", "J1529", "J153",
  "J154", "J155", "J1561", "J1569", "J157", "J158", "J159", "J160"
))
# J13 Pneumonia due to Streptococcus pneumoniae
# J14 Pneumonia due to Hemophilus influenzae
# J150 Pneumonia due to Klebsiella pneumoniae
# J151 Pneumonia due to Pseudomonas
# J1520 Pneumonia due to staphylococcus, unspecified
# J15211 Pneumonia due to Methicillin susceptible Staphylococcus aureus
# J15212 Pneumonia due to Methicillin resistant Staphylococcus aureus (not included in the filter above)
# J1529 Pneumonia due to other staphylococcus
# J153 Pneumonia due to streptococcus, group B
# J154 Pneumonia due to other streptococci
# J155 Pneumonia due to Escherichia coli
# J1561 Pneumonia due to Acinetobacter baumannii
# J1569 Pneumonia due to other Gram-negative bacteria
# J157 Pneumonia due to Mycoplasma pneumoniae
# J158 Pneumonia due to other specified bacteria
# J159 Unspecified bacterial pneumonia
# J160 Chlamydial pneumonia
# A tibble: 9 × 2
DISCHARGE_STATUS n
<fct> <int>
1 Routine to home 567
2 Left against medical advice 11
3 Transfer to short term facility 21
4 Transfer to long term facility 175
5 Home health care 263
6 Hospice care - home or medical facility 69
7 Other 254
8 Dead 306
9 <NA> 6
# visualization
ggplot(bactpneumo_patients, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("Bacterial pneumonia cases by sex") +
  theme_bw()
ggplot(bactpneumo_patients, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("Bacterial pneumonia cases by age") +
  theme_bw()
ggplot(bactpneumo_patients, aes(x = LOS)) +
  geom_bar() +
  labs(x = "Length of stay (up to 14 days)", y = "Count") +
  ggtitle("Bacterial pneumonia cases by length of stay") +
  theme_bw()
# greater than 15 binned in data
ggplot(bactpneumo_patients, aes(x = LOS_30DAYS)) +
  geom_bar() +
  labs(x = "Length of stay greater than 30 days", y = "Count") +
  ggtitle("Bacterial pneumonia cases by length of stay greater than 30 days") +
  theme_bw()
ggplot(bactpneumo_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("Bacterial pneumonia cases by discharge month") +
  theme_bw()
# A tibble: 9 × 2
DISCHARGE_STATUS n
<fct> <int>
1 Routine to home 380
2 Left against medical advice 9
3 Transfer to short term facility 18
4 Transfer to long term facility 76
5 Home health care 205
6 Hospice care - home or medical facility 30
7 Other 193
8 Dead 74
9 <NA> 7
# visualization
ggplot(Cdiff_patients, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("C. difficile infection by sex") +
  theme_bw()
ggplot(Cdiff_patients, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("C. difficile infection by age") +
  theme_bw()
ggplot(Cdiff_patients, aes(x = LOS)) +
  geom_bar() +
  labs(x = "Length of stay (up to 14 days)", y = "Count") +
  ggtitle("C. difficile infection by length of stay") +
  theme_bw()
# greater than 15 binned in data
ggplot(Cdiff_patients, aes(x = LOS_30DAYS)) +
  geom_bar() +
  labs(x = "Length of stay greater than 30 days", y = "Count") +
  ggtitle("C. difficile infection by length of stay greater than 30 days") +
  theme_bw()
ggplot(Cdiff_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("C. difficile infection by discharge month") +
  theme_bw()
# MRSA
mrsa_patients <- filter(alldiag_2020, DX %in% c("A4102", "A4902", "B9562", "J15212"))
# A4102 Sepsis due to Methicillin resistant Staphylococcus aureus
# A4902 Methicillin resistant Staphylococcus aureus infection, unspecified site
# B9562 Methicillin resistant Staphylococcus aureus infection as the cause of diseases classified elsewhere
# J15212 Pneumonia due to Methicillin resistant Staphylococcus aureus
# no MRSA cases in dataset
# enterococcus
entero_patients <- filter(alldiag_2020, DX %in% c("A4181", "B952"))
# A4181 Sepsis due to Enterococcus
# B952 Enterococcus as the cause of diseases classified elsewhere
# plot number of cases by demographics
# re-level age
entero_patients <- entero_patients |>
  mutate(age.simplified = cut(AGE,
    breaks = c(0, 5, 10, 15, 20, seq(30, 100, by = 10)),
    right = FALSE,
    labels = c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39",
               "40-49", "50-59", "60-69", "70-79", "80-89", "90+")
  ))
count(entero_patients, age.simplified)
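A quick check of how the cut() call above assigns ages to bins, using made-up ages placed on the bin edges. With right = FALSE each bin includes its lower bound and excludes its upper bound, so the bin labeled "90+" covers ages 90 through 99 and an age of 100 or more becomes NA. Whether such ages occur in the NHCS file is not something this sketch can tell.

```r
# Same breaks and labels as the re-leveling step above.
age_breaks <- c(0, 5, 10, 15, 20, seq(30, 100, by = 10))
age_labels <- c("less than 5", "5-9", "10-14", "15-19", "20-29", "30-39",
                "40-49", "50-59", "60-69", "70-79", "80-89", "90+")

# Made-up ages chosen to sit on or next to bin edges.
ages_toy <- c(4, 5, 19, 20, 89, 90, 99, 100)
bins_toy <- cut(ages_toy, breaks = age_breaks, right = FALSE, labels = age_labels)

data.frame(age = ages_toy, bin = bins_toy)
```

If ages of 100 or above can appear, replacing the final break of 100 with Inf would keep them in the last bin.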
# A tibble: 8 × 2
DISCHARGE_STATUS n
<fct> <int>
1 Routine to home 160
2 Left against medical advice 4
3 Transfer to short term facility 6
4 Transfer to long term facility 46
5 Home health care 161
6 Hospice care - home or medical facility 27
7 Other 116
8 Dead 22
# visualization
ggplot(entero_patients, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("Enterococcus infections by sex") +
  theme_bw()
ggplot(entero_patients, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("Enterococcus infections by age") +
  theme_bw()
ggplot(entero_patients, aes(x = LOS)) +
  geom_bar() +
  labs(x = "Length of stay (up to 14 days)", y = "Count") +
  ggtitle("Enterococcus infections by length of stay") +
  theme_bw()
# greater than 15 binned in data
ggplot(entero_patients, aes(x = LOS_30DAYS)) +
  geom_bar() +
  labs(x = "Length of stay greater than 30 days", y = "Count") +
  ggtitle("Enterococcus infections by length of stay greater than 30 days") +
  theme_bw()
ggplot(entero_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("Enterococcus infections by discharge month") +
  theme_bw()
# infections from catheter
infcatheter_patients <- filter(alldiag_2020, DX %in% c(
  "T80211A", "T80211D", "T80211S", "T80212A", "T80212D", "T80212S",
  "T80218A", "T80218D", "T80218S", "T80219A", "T80219D", "T80219S"
))
# T80211A Bloodstream infection due to central venous catheter, initial encounter
# T80211D Bloodstream infection due to central venous catheter, subsequent encounter
# T80211S Bloodstream infection due to central venous catheter, sequela
# T80212A Local infection due to central venous catheter, initial encounter
# T80212D Local infection due to central venous catheter, subsequent encounter
# T80212S Local infection due to central venous catheter, sequela
# T80218A Other infection due to central venous catheter, initial encounter
# T80218D Other infection due to central venous catheter, subsequent encounter
# T80218S Other infection due to central venous catheter, sequela
# T80219A Unspecified infection due to central venous catheter, initial encounter
# T80219D Unspecified infection due to central venous catheter, subsequent encounter
# T80219S Unspecified infection due to central venous catheter, sequela
# no catheter-associated infections in dataset
# ventilator-associated pneumonia
ventpenumo_patients <- filter(alldiag_2020, DX == "J95851")
# J95851 Ventilator associated pneumonia
# no cases in dataset
# surgical site infections
surginf_patients <- filter(alldiag_2020, DX %in% c(
  "T8141XA", "T8141XD", "T8141XS", "T8142XA", "T8142XD", "T8142XS",
  "T8143XA", "T8143XD", "T8143XS",
  "O8600", "O8601", "O8602", "O8603", "O8604", "O8609"
))
# T8141XA Infection following a procedure, superficial incisional surgical site, initial encounter
# T8141XD Infection following a procedure, superficial incisional surgical site, subsequent encounter
# T8141XS Infection following a procedure, superficial incisional surgical site, sequela
# T8142XA Infection following a procedure, deep incisional surgical site, initial encounter
# T8142XD Infection following a procedure, deep incisional surgical site, subsequent encounter
# T8142XS Infection following a procedure, deep incisional surgical site, sequela
# T8143XA Infection following a procedure, organ and space surgical site, initial encounter
# T8143XD Infection following a procedure, organ and space surgical site, subsequent encounter
# T8143XS Infection following a procedure, organ and space surgical site, sequela
# O8600 Infection of obstetric surgical wound, unspecified
# O8601 Infection of obstetric surgical wound, superficial incisional site
# O8602 Infection of obstetric surgical wound, deep incisional site
# O8603 Infection of obstetric surgical wound, organ and space site
# O8604 Sepsis following an obstetrical procedure
# O8609 Infection of obstetric surgical wound, other surgical site
# no cases in dataset
# plot month data together
covid_month <- ggplot(covid_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  theme_bw()
# adjust margins for cowplot
covid_month <- covid_month + theme(plot.margin = margin(t = 20, r = 10, b = 10, l = 10))
bactpneumo_month <- ggplot(bactpneumo_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  theme_bw()
# adjust margins for cowplot
bactpneumo_month <- bactpneumo_month + theme(plot.margin = margin(t = 20, r = 10, b = 10, l = 10))
cdiff_month <- ggplot(Cdiff_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  theme_bw()
# adjust margins for cowplot
cdiff_month <- cdiff_month + theme(plot.margin = margin(t = 20, r = 10, b = 10, l = 10))
entero_month <- ggplot(entero_patients, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  theme_bw()
# adjust margins for cowplot
entero_month <- entero_month + theme(plot.margin = margin(t = 20, r = 10, b = 10, l = 10))
plot_grid(covid_month, bactpneumo_month, cdiff_month, entero_month,
  labels = c('COVID-19', 'Bacterial pneumonia', 'C. difficile', 'Enterococcus'),
  label_size = 12, label_x = -0.05, label_y = 1)
Bacterial pneumonia follows a similar pattern to COVID-19, as expected, but C. difficile and Enterococcus infections appear to be fairly consistent throughout the year.
There were no cases of infections associated with surgeries, infections from catheter use, methicillin-resistant Staphylococcus aureus infections, or ventilator-associated pneumonia in the dataset.
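Raw monthly counts depend on how many discharges occurred each month, so a dip in cases may simply reflect fewer admissions. Below is a sketch of expressing cases per 1,000 discharges instead. The data frame `discharges_toy` and its `infected` flag are hypothetical, and the values are made up.

```r
# Made-up data: one row per discharge, infected = 1 if an infection code is present.
discharges_toy <- data.frame(
  DISCHARGE_MONTH = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
  infected        = c(1, 0, 0, 0, 1, 0, 1, 1, 0, 0)
)

# Cases and total discharges for each month.
cases      <- tapply(discharges_toy$infected, discharges_toy$DISCHARGE_MONTH, sum)
discharges <- tapply(discharges_toy$infected, discharges_toy$DISCHARGE_MONTH, length)

monthly_toy <- data.frame(
  month      = as.integer(names(cases)),
  cases      = as.vector(cases),
  discharges = as.vector(discharges)
)

# Rate per 1,000 discharges, comparable across months of different volume.
monthly_toy$per_1000 <- 1000 * monthly_toy$cases / monthly_toy$discharges
monthly_toy
```

In this toy example months 2 and 3 have different case counts but the same rate, which is the distinction a count-only bar chart cannot show.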
# proportions by discharge month and age
ggplot(covid_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +
  geom_bar(position = "fill") +
  scale_fill_brewer(palette = "Spectral") +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Discharge Month", y = "Proportion", fill = "Age") +
  ggtitle("COVID-19 cases") +
  theme_bw()
ggplot(bactpneumo_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +
  geom_bar(position = "fill") +
  scale_fill_brewer(palette = "Spectral") +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Discharge Month", y = "Proportion", fill = "Age") +
  ggtitle("Bacterial pneumonia cases") +
  theme_bw()
ggplot(Cdiff_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +
  geom_bar(position = "fill") +
  scale_fill_brewer(palette = "Spectral") +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Discharge Month", y = "Proportion", fill = "Age") +
  ggtitle("C. difficile infections") +
  theme_bw()
ggplot(entero_patients, aes(x = DISCHARGE_MONTH, fill = age.simplified)) +
  geom_bar(position = "fill") +
  scale_fill_brewer(palette = "Spectral") +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Discharge Month", y = "Proportion", fill = "Age") +
  ggtitle("Enterococcus infections") +
  theme_bw()
# Add descriptions to diagnosis
# Read the lines from the file
lines <- readLines("./2021-code-descriptions-tabular-order/icd10cm_codes_2021.txt")
# Split each line by 2 or more spaces
split_lines <- strsplit(lines, "\\s{2,}")
# Find the maximum number of columns
max_length <- max(sapply(split_lines, length))
# Pad shorter rows with NA
padded_lines <- lapply(split_lines, function(x) {
  length(x) <- max_length # Extend the length
  return(x) # Returns the padded line
})
# Convert the list to a data frame
icd_code <- do.call(rbind, lapply(padded_lines, function(x) as.data.frame(t(x), stringsAsFactors = FALSE)))
# Optionally, set column names if needed
colnames(icd_code) <- c("DX", "Description")
alldiag_desc <- left_join(alldiag_2020, icd_code)
Joining with `by = join_by(DX)`
# diagnosis arranged by number of cases
alldiag_desc |>
  group_by(Description) |>
  summarize(count = n()) |>
  arrange(desc(count))
# A tibble: 3,814 × 2
Description count
<chr> <int>
1 <NA> 3217815
2 Essential (primary) hypertension 30922
3 Hyperlipidemia, unspecified 27947
4 Acute kidney failure, unspecified 18190
5 Gastro-esophageal reflux disease without esophagitis 17444
6 Single live birth 12200
7 Encounter for immunization 11723
8 Major depressive disorder, single episode, unspecified 11503
9 Anxiety disorder, unspecified 11406
10 Hypo-osmolality and hyponatremia 10649
# ℹ 3,804 more rows
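The large <NA> group in the table above includes rows whose code found no description. One possible contributor, offered as an assumption about the code file rather than a confirmed diagnosis: if a code is separated from its description by a single space, a split on two or more spaces leaves the line in one piece and the description is lost. The sketch below uses two illustrative lines (codes and descriptions taken from the comments earlier in this section; the spacing is assumed) and splits on the first run of whitespace instead.

```r
# Illustrative lines only; the real file's spacing is an assumption here.
icd_lines_toy <- c(
  "J13     Pneumonia due to Streptococcus pneumoniae",
  "T80211A Bloodstream infection due to central venous catheter, initial encounter"
)

# Splitting on two or more spaces separates the first line but not the second.
pieces_two_space <- strsplit(icd_lines_toy, "\\s{2,}")
lengths(pieces_two_space)

# Splitting at the first run of whitespace handles both layouts.
icd_code_toy <- data.frame(
  DX          = sub("^(\\S+)\\s+.*$", "\\1", icd_lines_toy),
  Description = sub("^\\S+\\s+", "", icd_lines_toy),
  stringsAsFactors = FALSE
)
icd_code_toy
```

Missing diagnosis slots (NA in DX) would also land in the <NA> group regardless of how the file is parsed, so this can explain only part of that count at most.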
# determine whether discharge month is associated with COVID
# filter DISCHARGE_MONTH and DX columns
dx_month <- alldiag_desc |>
  select(c(DISCHARGE_MONTH, DX))
# Remove NA in DX
dx_month_narm <- dx_month[!is.na(dx_month$DX), ]
# Remove NA in Discharge Month
dx_month_narm <- dx_month_narm[!is.na(dx_month_narm$DISCHARGE_MONTH), ]
# create new column flagging COVID-19 diagnoses (1 = U071, 0 = any other code)
covid_binary_month <- dx_month_narm |>
  mutate(COVID = ifelse(DX == "U071", 1, 0))
# Fit the logistic model
covid.month.fit <- glm(COVID ~ DISCHARGE_MONTH, data = covid_binary_month, family = binomial)
summary(covid.month.fit)
Call:
glm(formula = COVID ~ DISCHARGE_MONTH, family = binomial, data = covid_binary_month)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.500450 0.034265 -189.71 <2e-16 ***
DISCHARGE_MONTH 0.142463 0.003886 36.66 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 83171 on 1498051 degrees of freedom
Residual deviance: 81710 on 1498050 degrees of freedom
AIC: 81714
Number of Fisher Scoring iterations: 8
# The positive coefficient for DISCHARGE_MONTH (0.142) and p-value below 2e-16 indicate a significant positive association between discharge month and COVID-19 diagnosis (as expected).
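Entering DISCHARGE_MONTH as a number asks the model for a single steady trend in log-odds from January to December. If cases instead rise and fall in waves, treating month as a categorical variable can describe that shape. Below is a sketch on simulated data; every value is simulated, and the peak months 4 and 12 are arbitrary assumptions chosen for illustration.

```r
set.seed(503)

# Simulate 500 diagnosis rows per month, with a higher event probability
# in two arbitrarily chosen peak months.
sim <- data.frame(month = rep(1:12, each = 500))
sim$p <- plogis(-3 + 1.2 * (sim$month %in% c(4, 12)))
sim$event <- rbinom(nrow(sim), size = 1, prob = sim$p)

# Month as a number: one slope for the whole year.
fit_linear <- glm(event ~ month, data = sim, family = binomial)

# Month as a factor: a separate log-odds for each month (January is the reference).
fit_factor <- glm(event ~ factor(month), data = sim, family = binomial)

# Likelihood-ratio test comparing the two nested models.
anova(fit_linear, fit_factor, test = "Chisq")
```

The linear model is a special case of the factor model, so the comparison tests whether month-specific effects fit better than a single trend.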
# determine whether discharge month is associated with bacterial pneumonia
bactpneumo_code <- c("J13", "J14", "J150", "J151", "J1520", "J15211", "J1529", "J153",
                     "J154", "J155", "J1561", "J1569", "J157", "J158", "J159", "J160")
# create new column flagging bacterial pneumonia diagnoses (1 = code in bactpneumo_code, 0 = any other code)
bactpneumo_binary_month <- dx_month_narm |>
  mutate(bactpneumo = ifelse(DX %in% bactpneumo_code, 1, 0))
# Fit the logistic model
bactpneumo.month.fit <- glm(bactpneumo ~ DISCHARGE_MONTH, data = bactpneumo_binary_month, family = binomial)
summary(bactpneumo.month.fit)
Call:
glm(formula = bactpneumo ~ DISCHARGE_MONTH, family = binomial,
data = bactpneumo_binary_month)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.582895 0.049489 -133.017 < 2e-16 ***
DISCHARGE_MONTH -0.033551 0.006973 -4.811 1.5e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 26074 on 1498051 degrees of freedom
Residual deviance: 26051 on 1498050 degrees of freedom
AIC: 26055
Number of Fisher Scoring iterations: 9
# The negative coefficient for DISCHARGE_MONTH (-0.03) and p-value below 0.001 indicate a statistically significant, though small, negative association between discharge month and bacterial pneumonia.
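Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. Below is a short worked calculation using the two DISCHARGE_MONTH estimates reported above. The December-versus-January figure is an extrapolation that holds only if the log-odds really change linearly with month.

```r
# Coefficients for DISCHARGE_MONTH as reported in the two model summaries above.
b_covid      <- 0.142463
b_bactpneumo <- -0.033551

# Odds ratio for a one-month step.
or_per_month <- exp(c(covid = b_covid, bactpneumo = b_bactpneumo))

# Implied odds ratio for December versus January (11 one-month steps).
or_dec_vs_jan <- exp(11 * c(covid = b_covid, bactpneumo = b_bactpneumo))

round(or_per_month, 3)
round(or_dec_vs_jan, 2)
```

On this scale the COVID-19 estimate corresponds to roughly a 15% increase in odds per month, and the bacterial pneumonia estimate to roughly a 3% decrease per month.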
# flag visits with a COVID-19 code (U071) in any of columns 21:50 (not only the primary diagnosis, DX1),
# to test whether COVID diagnosis is associated with other hospital-associated infections
alldiag_covid_binary <- var_2020 |>
  mutate(COVID = ifelse(rowSums(select(var_2020, 21:50) == "U071", na.rm = TRUE) > 0, 1, 0))
# Fit the logistic model
covid.bactpneumo.fit <- glm(bactpneumo ~ COVID, data = alldiag_bactpneumo_binary, family = binomial)
summary(covid.bactpneumo.fit)
Call:
glm(formula = bactpneumo ~ COVID, family = binomial, data = alldiag_bactpneumo_binary)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.2344 0.1049 -68.99 < 2e-16 ***
COVID 1.6850 0.2262 7.45 9.33e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1865.7 on 132693 degrees of freedom
Residual deviance: 1826.3 on 132692 degrees of freedom
AIC: 1830.3
Number of Fisher Scoring iterations: 10
# The positive coefficient (1.69) and p-value below 0.001 indicate a strong positive association between bacterial pneumonia and COVID-19 diagnosis.
# determine whether COVID is associated with C. difficile colitis
alldiag_cdiff_binary <- alldiag_covid_binary |>
  mutate(Cdiff = ifelse(rowSums(select(alldiag_covid_binary, 21:50) == "A047", na.rm = TRUE) > 0, 1, 0))
alldiag_cdiff_factor <- alldiag_cdiff_binary %>%
  mutate(
    COVID = factor(COVID, levels = c(0, 1), labels = c("Negative", "Positive")),
    Cdiff = factor(Cdiff, levels = c(0, 1), labels = c("Negative", "Positive"))
  )
# Create the plot with labeled factors
ggplot(alldiag_cdiff_factor, aes(x = Cdiff, fill = COVID)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = c("Negative" = "#aaaaaa", "Positive" = "#b20000")) +
  theme_bw() +
  labs(x = "C. difficile infection", y = "Proportion", fill = "COVID Status")
# Fit the logistic model
covid.cdiff.fit <- glm(Cdiff ~ COVID, data = alldiag_cdiff_binary, family = binomial)
summary(covid.cdiff.fit)
Call:
glm(formula = Cdiff ~ COVID, family = binomial, data = alldiag_cdiff_binary)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.89046 0.03270 -149.548 <2e-16 ***
COVID 0.03812 0.14568 0.262 0.794
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 11690 on 132693 degrees of freedom
Residual deviance: 11690 on 132692 degrees of freedom
AIC: 11694
Number of Fisher Scoring iterations: 7
# no significant association between C. difficile colitis and COVID-19 diagnosis (p = 0.794)
# determine whether COVID diagnosis is associated with Enterococcus infection
entero_code <- c("A4181", "B952")
# use %in% so every cell is tested against both codes
# (== would recycle the two codes position by position and miss matches)
alldiag_entero_binary <- alldiag_covid_binary |>
  mutate(Entero = ifelse(rowSums(sapply(select(alldiag_covid_binary, 21:50), function(x) x %in% entero_code)) > 0, 1, 0))
alldiag_entero_factor <- alldiag_entero_binary %>%
  mutate(
    COVID = factor(COVID, levels = c(0, 1), labels = c("Negative", "Positive")),
    Entero = factor(Entero, levels = c(0, 1), labels = c("Negative", "Positive"))
  )
# Create the plot with labeled factors
ggplot(alldiag_entero_factor, aes(x = Entero, fill = COVID)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = c("Negative" = "#aaaaaa", "Positive" = "#b20000")) +
  theme_bw() +
  labs(x = "Enterococcus infection", y = "Proportion", fill = "COVID Status")
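When a column is checked against more than one code, == and %in% behave differently. Below is a small illustration with made-up diagnosis codes. Comparing with == pairs the values up by position and recycles the shorter vector, while %in% tests each value for membership in the code list. The same positional recycling should apply when the left-hand side is a data frame of diagnosis columns.

```r
entero_code <- c("A4181", "B952")

# Made-up diagnosis codes for four rows.
dx_toy <- c("B952", "A4181", "J159", "A4181")

# == compares position by position, recycling entero_code as
# "A4181", "B952", "A4181", "B952", so every match here is missed.
hits_eq <- dx_toy == entero_code

# %in% asks, for each element, whether it appears anywhere in entero_code.
hits_in <- dx_toy %in% entero_code

data.frame(dx_toy, hits_eq, hits_in)
```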
# Fit the logistic model
covid.entero.fit <- glm(Entero ~ COVID, data = alldiag_entero_binary, family = binomial)
summary(covid.entero.fit)
Call:
glm(formula = Entero ~ COVID, family = binomial, data = alldiag_entero_binary)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.21056 0.06293 -98.686 <2e-16 ***
COVID 0.21349 0.25810 0.827 0.408
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3873.6 on 132693 degrees of freedom
Residual deviance: 3873.0 on 132692 degrees of freedom
AIC: 3877
Number of Fisher Scoring iterations: 9
# no significant association between Enterococcus infection and COVID-19 diagnosis (p = 0.408)
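To compare the three COVID models on one scale, the reported estimates can be turned into odds ratios with approximate 95% Wald confidence intervals. The sketch below copies the estimates and standard errors from the summaries above and does not refit anything. The Wald interval is one common approximation, and profile-likelihood intervals from the fitted models could differ slightly.

```r
# Estimates and standard errors for the COVID term, copied from the three
# model summaries above.
covid_terms <- data.frame(
  outcome  = c("Bacterial pneumonia", "C. difficile", "Enterococcus"),
  estimate = c(1.6850, 0.03812, 0.21349),
  se       = c(0.2262, 0.14568, 0.25810)
)

z <- qnorm(0.975)  # about 1.96 for a 95% interval

covid_terms$odds_ratio <- exp(covid_terms$estimate)
covid_terms$ci_lower   <- exp(covid_terms$estimate - z * covid_terms$se)
covid_terms$ci_upper   <- exp(covid_terms$estimate + z * covid_terms$se)

# TRUE when the interval excludes an odds ratio of 1 (no association).
covid_terms$excludes_one <- covid_terms$ci_lower > 1 | covid_terms$ci_upper < 1

covid_terms
```

Only the bacterial pneumonia interval excludes 1, which matches the p-values reported in the summaries.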
# Read in NHCS 2020 emergency department R Dataset
url2 <- "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHCS/2020/R/nhcs2020ed_r.rds"
nhcs2020ed <- read_rds(url2)
# emergency department dataset
# pull out variables list
var_2020_ed <- nhcs2020ed$variables
# select useful columns
var_2020_ed_select <- select(var_2020_ed, 1:37)
# variable names
varnames_2020_ed <- colnames(var_2020_ed_select)
# pull out primary diagnosis (DX1)
diag1_2020_ed <- var_2020_ed_select$DX1
# determine the number of primary C. diff infections (ICD-10 diagnosis code: A047)
primaryCdiff_2020_ed <- sum(diag1_2020_ed == "A047", na.rm = TRUE)
primaryCdiff_2020_ed
[1] 151
# there were 151 primary C. difficile diagnoses in the emergency department data
# A tibble: 9 × 2
DISCHARGE_STATUS n
<fct> <int>
1 Routine to home 7395
2 Left against medical advice 80
3 Transfer to short term facility 145
4 Transfer to long term facility 114
5 Home health care 347
6 Hospice care - home or medical facility 121
7 Other 995
8 Dead 440
9 <NA> 485
# visualization
ggplot(covid_patients_ed, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("COVID-19 cases by sex") +
  theme_bw()
ggplot(covid_patients_ed, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("COVID-19 cases by age") +
  theme_bw()
ggplot(covid_patients_ed, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("COVID-19 cases by discharge month") +
  theme_bw()
# subset bacterial pneumonia patients
bactpneumo_patients_ed <- filter(alldiag_2020_ed, DX %in% c(
  "J13", "J14", "J150", "J151", "J1520", "J15211", "J1529", "J153",
  "J154", "J155", "J1561", "J1569", "J157", "J158", "J159", "J160"
))
# J13 Pneumonia due to Streptococcus pneumoniae
# J14 Pneumonia due to Hemophilus influenzae
# J150 Pneumonia due to Klebsiella pneumoniae
# J151 Pneumonia due to Pseudomonas
# J1520 Pneumonia due to staphylococcus, unspecified
# J15211 Pneumonia due to Methicillin susceptible Staphylococcus aureus
# J15212 Pneumonia due to Methicillin resistant Staphylococcus aureus (not included in the filter above)
# J1529 Pneumonia due to other staphylococcus
# J153 Pneumonia due to streptococcus, group B
# J154 Pneumonia due to other streptococci
# J155 Pneumonia due to Escherichia coli
# J1561 Pneumonia due to Acinetobacter baumannii
# J1569 Pneumonia due to other Gram-negative bacteria
# J157 Pneumonia due to Mycoplasma pneumoniae
# J158 Pneumonia due to other specified bacteria
# J159 Unspecified bacterial pneumonia
# J160 Chlamydial pneumonia
# A tibble: 9 × 2
DISCHARGE_STATUS n
<fct> <int>
1 Routine to home 272
2 Left against medical advice 10
3 Transfer to short term facility 9
4 Transfer to long term facility 48
5 Home health care 96
6 Hospice care - home or medical facility 31
7 Other 87
8 Dead 87
9 <NA> 27
# visualization
ggplot(bactpneumo_patients_ed, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("Bacterial pneumonia cases by sex") +
  theme_bw()
ggplot(bactpneumo_patients_ed, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("Bacterial pneumonia cases by age") +
  theme_bw()
ggplot(bactpneumo_patients_ed, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("Bacterial pneumonia cases by discharge month") +
  theme_bw()
# A tibble: 9 × 2
DISCHARGE_STATUS n
<fct> <int>
1 Routine to home 204
2 Left against medical advice 7
3 Transfer to short term facility 4
4 Transfer to long term facility 17
5 Home health care 71
6 Hospice care - home or medical facility 18
7 Other 115
8 Dead 24
9 <NA> 32
# visualization
ggplot(Cdiff_patients_ed, aes(x = SEX)) +
  geom_bar() +
  labs(x = "Sex", y = "Count") +
  ggtitle("C. difficile infection by sex") +
  theme_bw()
ggplot(Cdiff_patients_ed, aes(x = age.simplified)) +
  geom_bar() +
  labs(x = "Age", y = "Count") +
  ggtitle("C. difficile infection by age") +
  theme_bw()
ggplot(Cdiff_patients_ed, aes(x = DISCHARGE_MONTH)) +
  geom_bar() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Count") +
  ggtitle("C. difficile infection by discharge month") +
  theme_bw()
4 Results
Conclusions
5 Conclusion